Optimization of SAD Algorithm on VLIW DSP

Authors

  • Hui-Jae You
  • Sun-Tae Chung
  • Souhwan Jung
Abstract

The SAD (Sum of Absolute Differences) algorithm is heavily used in motion estimation, a computationally demanding stage of motion picture encoding. To improve encoding performance on a VLIW processor, an efficient implementation of the SAD algorithm on that processor is essential. SAD is programmed as a nested loop with a conditional branch. On VLIW processors, loops are usually optimized by software pipelining, but research on optimal software-pipelining schedules for nested loops, especially nested loops with conditional branches, is rare. In this paper, we propose an optimal scheduling and implementation of the SAD algorithm with a conditional branch on a VLIW DSP processor. The proposed scheduling first transforms the nested loop with a conditional branch into a single loop with a conditional branch, taking into account both full utilization of the processor's ILP (instruction-level parallelism) capability and earlier escape from the loop. It then applies a modulo scheduling technique developed for single loops. Based on this scheduling strategy, an optimal implementation of the SAD algorithm on the TMS320C67x, a VLIW DSP, is presented. Experiments on the TMS320C6713 DSK show that an H.263 encoder using the proposed SAD implementation performs better than H.263 encoders using other SAD implementations, and that the code size of the proposed implementation is small enough for embedded environments.

Keywords: Optimal implementation, SAD algorithm, VLIW, TMS320C6713.
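The loop transformation described in the abstract can be illustrated with a small portable C sketch. This is not the authors' TMS320C67x implementation: the function names `sad_nested` and `sad_flat`, the 16x16 block size, the per-row exit test, and the four-pixel unrolling are all assumptions made for illustration. The first function is the conventional nested loop with an early-exit branch; the second collapses it into a single loop with one exit test per iteration, which is the kind of loop shape a modulo scheduler for single loops can work on. The modulo scheduling itself is left to the compiler or to hand-written assembly and is not shown.

```c
#include <stdint.h>
#include <stdlib.h>

#define BLOCK 16  /* assumed block size: 16x16 pixels */

/* Nested form: inner loop sums one row, the branch tests after each row. */
uint32_t sad_nested(const uint8_t *cur, const uint8_t *ref,
                    int stride, uint32_t best_so_far)
{
    uint32_t sad = 0;
    for (int y = 0; y < BLOCK; y++) {
        for (int x = 0; x < BLOCK; x++)
            sad += (uint32_t)abs(cur[y * stride + x] - ref[y * stride + x]);
        if (sad >= best_so_far)   /* this candidate can no longer win */
            return sad;
    }
    return sad;
}

/* Flattened form: one loop over all pixels, four per iteration.
   The four differences are independent, which exposes parallel work,
   and the exit test runs every iteration, so the loop can stop sooner. */
uint32_t sad_flat(const uint8_t *cur, const uint8_t *ref,
                  int stride, uint32_t best_so_far)
{
    uint32_t sad = 0;
    for (int i = 0; i < BLOCK * BLOCK; i += 4) {
        /* Map the flat index back to a row/column offset. */
        int off = (i / BLOCK) * stride + (i % BLOCK);
        uint32_t d0 = (uint32_t)abs(cur[off]     - ref[off]);
        uint32_t d1 = (uint32_t)abs(cur[off + 1] - ref[off + 1]);
        uint32_t d2 = (uint32_t)abs(cur[off + 2] - ref[off + 2]);
        uint32_t d3 = (uint32_t)abs(cur[off + 3] - ref[off + 3]);
        sad += (d0 + d1) + (d2 + d3);
        if (sad >= best_so_far)   /* early escape from the single loop */
            break;
    }
    return sad;
}
```

When no early exit is taken, both functions return the exact SAD of the block. When the threshold is reached, each returns a partial sum that is at least `best_so_far`, so a caller comparing against the current best match reaches the same decision either way; the flattened version simply tends to reach it after fewer pixels.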


Similar resources

Preferred Strategies for Optimizing Convolution on VLIW DSP Architectures

Convolution is a central algorithm for implementing linear time invariant systems that constitute the heart of most digital signal processing algorithms. Performance on the linear convolution algorithm has been one of the primary benchmarks used to discern the performance of dedicated digital signal processing architectures (DSP). While DSP benchmarks are far more varied and complex...


Copy Propagation Optimizations for VLIW DSP Processors with Distributed Register Files

High-performance and low-power VLIW DSP processors are increasingly deployed on embedded devices to process video and multimedia applications. To reduce power and cost in designs of VLIW DSP processors, distributed register files and multi-bank register architectures are being adopted to reduce the number of read/write ports in register files. This presents new challenges for devising com...


An Advanced Compiler Designed for a VLIW DSP for Sensors-Based Systems

The VLIW architecture can be exploited to greatly enhance instruction level parallelism, thus it can provide computation power and energy efficiency advantages, which satisfies the requirements of future sensor-based systems. However, as VLIW codes are mainly compiled statically, the performance of a VLIW processor is dominated by the behavior of its compiler. In this paper, we present an advan...


Optimization of FIR filter implementation for FMT on VLIW DSP

The paper summarizes the FMT modulation prototype filter design and its efficient implementation on DSP. The optimum design of algorithms for digital signal processors with VLIW architecture is described. Using this new approach it was, for example, possible to optimize compilation from the C language into the assembler of TMS320C6414 digital signal processor for implementation of FMT modulatio...


LC-GRFA: global register file assignment with local consciousness for VLIW DSP processors with non-uniform register files

Embedded processors developed within the past few years have employed novel hardware designs to reduce the ever-growing complexity, power dissipation, and die area. Although a distributed register file architecture is considered to have fewer read/write ports than traditional unified register file structures, it presents challenges in compilation techniques to generate efficient code...



Publication date: 2012